Congratulations to Professors Szekely and Rizzo for such an exciting and 
enjoyable contribution. It is not often that one of our most basic techniques 
is given so fundamental, and so successful, a rethinking. Although using dis- 
tance covariance requires giving up some useful properties associated with 
linearity — directionality/sign, exact expressions for the variance and covari- 
ance of sums, direct connection to the multivariate normal distribution — it 
offers useful properties in exchange. Distance covariance gives a true indi- 
cator of independence even for non-normal distributions, applies directly in 
multivariate settings (even when "$p \gg n$"), is the basis for general and
powerful tests, can be adapted to use ranks, provides conditions for central limit
theorems, and is straightforward to compute. That seems to be a favorable 
trade. In this discussion I will focus on the meaning of Brownian covariance, 
but first I want to raise a few questions to the authors (and the field). 

The paper adapts the statistic in examples to derive resampling tech- 
niques and tests for nonlinearity and extends the covariance definition in 
several ways. Perhaps the authors can comment on how general these de- 
rived techniques are. For instance, what additional conditions, if any, are 
required for the test of nonlinearity in Example 6 (based on
$\mathrm{dCov}(X, (I - X(X^{T}X)^{-1}X^{T})Y)$) to be consistent? Also, the computations
would appear to be $O(n^2)$, which can be burdensome for very large n. Are there speed-ups
or approximations that yield comparable results more quickly? And are rates 
of convergence available for the empirical statistics, perhaps under stronger 
moment conditions? 
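
The quadratic cost is visible in a direct implementation. The following Python sketch computes the squared sample distance covariance from double-centered pairwise distance matrices, following the general form of the statistic in Szekely, Rizzo and Bakirov (2007). The function names (`dcov2`, `dcor`) are illustrative choices, not the authors' code, and the sketch makes no attempt at the speed-ups asked about here.

```python
import numpy as np


def _centered_distances(z):
    """Double-centered matrix of pairwise Euclidean distances between the rows of z."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # treat a 1-D sample as n points in R^1
    # n x n distance matrix: this is where the O(n^2) time and memory arise.
    d = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def dcov2(x, y):
    """Squared sample distance covariance: mean of the entrywise product A * B."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    return float((a * b).mean())


def dcor(x, y):
    """Sample distance correlation, a value in [0, 1]."""
    denom = np.sqrt(dcov2(x, x) * dcov2(y, y))
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2(x, y), 0.0) / denom))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    y = x ** 2  # dependent on x, but linearly uncorrelated with it
    print("Pearson correlation:", np.corrcoef(x, y)[0, 1])
    print("distance correlation:", dcor(x, y))
```

Both inputs may be vectors or n-by-p arrays with one observation per row, so the same code covers the multivariate case.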

But these are details. Even though the Pearson correlation is entrenched
in the practice of several fields, including our own, what reason do we have
not to aggressively introduce distance covariance and correlation into our
practice and our teaching, even at the introductory level? It is rare in
practice that we want a measure of linear association per se; more typically
we use Pearson correlation as a proxy. Distance covariance provides most of
what we do want in these cases, with attendant theory and convenience that
is hard to beat. And teaching about the difference between "uncorrelated"
and "independent" is a thorn in the side of anyone who has had to do so.
Distance covariance would require no more sophisticated ideas than what we
already use in teaching correlation, without that complication. The statistic
is expressed in terms of distances, which are easy to understand, and it
would free us from undue emphasis on Normal examples. It is interesting to
ponder what it would take to change practice at this level.

This is an electronic reprint of the original article published by the
Institute of Mathematical Statistics in The Annals of Applied Statistics,
2009, Vol. 3, No. 4, 1299-1302. This reprint differs from the original in
pagination and typographic detail.

C. R. GENOVESE

However, what principally distinguishes this paper from Szekely, Rizzo 
and Bakirov (2007) is the introduction of Brownian covariance. Because of
the "surprising coincidence" that Brownian covariance equals distance
covariance, though under slightly more restrictive conditions, Brownian
covariance may appear to be merely an interesting, if abstract, representation.
But Brownian covariance can help us understand how and why distance co- 
variance works and how it can be generalized to obtain measures with other 
desirable properties. As the authors write, Brownian covariance "measures 
the degree of all kinds of possible relationships between two real-valued
random variables." Because it may not be obvious why this statement is true,
my goal here is to explain it in a different way and to offer insight into 
what the Brownian covariance means and how it can be usefully generalized.
I will do this by studying the $(U, V)$-covariance (Definition 5 in the
paper) for a special class of stochastic processes. To keep the focus on the 
ideas rather than details, I will consider only a simple case here, and I will 
play somewhat loose with regularity conditions (e.g., limit interchanges), 
but all of this can be made rigorous and general without excessive effort or 
conditions. 

Let X and Y be scalar random variables with finite second moments. 
Denote their joint density by $g_{X,Y}$ and marginal densities by $g_X$ and $g_Y$.
For simplicity, assume that these densities are square integrable and have
support on $[0, 1]^2$ and $[0, 1]$ respectively, although these restrictions can easily
be weakened. Let $(\phi_i)$ and $(\psi_j)$ be two sequences of deterministic functions.
They may be finite or infinite collections and need not be orthogonal. Define 



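\begin{align}
A_{ij} &= \iint (\phi_i \otimes \psi_j)\,(g_{X,Y} - g_X \otimes g_Y) \tag{1}\\
&= \iint (\phi_i \otimes \psi_j)\, g_{X,Y} - \int \phi_i\, g_X \int \psi_j\, g_Y \tag{2}\\
&= \operatorname{Cov}(\phi_i(X), \psi_j(Y)) \tag{3}\\
&= E\, X_{\phi_i} Y_{\psi_j}. \tag{4}
\end{align}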



Now consider stochastic processes U and V that can be written as series
expansions with Normal coefficients. Specifically, given suitable positive values
$\sigma_i$ and $\tau_j$, define

\begin{align}
U(s) &= \sum_i \sigma_i Z_i \phi_i(s), \tag{5}\\
V(t) &= \sum_j \tau_j Z'_j \psi_j(t), \tag{6}
\end{align}

where the $Z_i$'s and $Z'_j$'s are independent standard Normal random variables.

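Using the notation of the paper and interchanging expectation and sums
in the definitions of $X_U = U(X) - E(U(X) \mid U)$ and related random variables,
we have

\begin{align*}
X_U = \sum_i \sigma_i Z_i X_{\phi_i}, \qquad Y_V = \sum_j \tau_j Z'_j Y_{\psi_j},
\end{align*}

where $X_{\phi_i} = \phi_i(X) - E\,\phi_i(X)$ and $Y_{\psi_j} = \psi_j(Y) - E\,\psi_j(Y)$.
It follows from Definition 5 in the paper that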

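\begin{align}
\operatorname{Cov}^2_{U,V}(X, Y) &= E(X_U X'_U Y_V Y'_V) \notag\\
&= \sum_{i,j,k,\ell} \sigma_i \sigma_k \tau_j \tau_\ell\, E(Z_i Z_k Z'_j Z'_\ell)\, E(X_{\phi_i} Y_{\psi_j})\, E(X'_{\phi_k} Y'_{\psi_\ell}). \tag{7}
\end{align}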
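
The collapse of the four-fold sum in (7) can be spelled out as follows. This is a sketch that assumes, as in (5) and (6), that the $Z_i$ and $Z'_j$ are mutually independent standard Normals, and that $(X', Y')$ is an independent copy of $(X, Y)$ as in Definition 5 of the paper. Here $X_{\phi_i} = \phi_i(X) - E\,\phi_i(X)$, $Y_{\psi_j} = \psi_j(Y) - E\,\psi_j(Y)$, and $\delta$ is the Kronecker delta:

\begin{align*}
E(Z_i Z_k Z'_j Z'_\ell) &= E(Z_i Z_k)\, E(Z'_j Z'_\ell) = \delta_{ik}\,\delta_{j\ell},\\
E(X_{\phi_i} Y_{\psi_j}) &= E(X'_{\phi_i} Y'_{\psi_j}) = A_{ij},\\
\operatorname{Cov}^2_{U,V}(X, Y) &= \sum_{i,j} \sigma_i^2\, \tau_j^2\, A_{ij}^2 .
\end{align*}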

Equation (7) shows that $\operatorname{Cov}_{U,V}(X, Y) = 0$ if and only if every $A_{ij} = 0$. For
this covariance to determine independence, we must have that all $A_{ij} = 0$
if and only if X and Y are independent. A sufficient condition for this is
that the functions $\phi_i \otimes \psi_j$ form a (Schauder) basis for a class of functions
containing $g_{X,Y} - g_X \otimes g_Y$ (e.g., $L^2$). Note that in this case



\begin{align}
(f_{X,Y} - f_X \otimes f_Y)(s, t) &= \iint e^{\mathrm{i}(sx + ty)}\,(g_{X,Y} - g_X \otimes g_Y)(x, y)\,dx\,dy \tag{8}\\
&= \iint e^{\mathrm{i}(sx + ty)} \sum_{i,j} A_{ij}\,\phi_i(x)\,\psi_j(y)\,dx\,dy \tag{9}\\
&= \sum_{i,j} A_{ij}\,\hat\phi_i(s)\,\hat\psi_j(t), \tag{10}
\end{align}

where the $f$'s are the characteristic functions of the $g$'s as in the paper and
the $\hat\phi$'s and $\hat\psi$'s are the corresponding Fourier transforms. This shows that
the covariance is related to a "norm" of $f_{X,Y} - f_X \otimes f_Y$ and thus highlights
the connection to the distance covariance as defined in the paper.

Now it is well known (the Levy-Ciesielski construction) that Brownian 
motion can be written as 

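\begin{align}
W_t = \sum_i Z_i S_i(t), \tag{11}
\end{align}

where $S_i$ is the $i$th Schauder function obtained by $S_i(t) = \int_0^t H_i$ for the
corresponding function $H_i$ in the Haar system.^1 The expansion (11) corresponds
to $U = W$ and $V = W'$ with $\sigma_i = \tau_j = 1$ and $\phi_k = \psi_k$ equal to the corresponding
Schauder functions for all k. Hence,

\begin{align}
\operatorname{Cov}^2_W(X, Y) = \sum_{i,j} A_{ij}^2 = \|A\|_F^2, \tag{12}
\end{align}

the squared Frobenius norm of the infinite-order matrix A. Because the Schauder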
functions form a (nonorthogonal) basis for the set of continuous functions 
on an interval (in sup-norm) and for the spaces $L^p$ for $1 < p < \infty$, we can
see that a zero Brownian covariance is equivalent to independence of X and 
Y. Because the Schauder functions have support in nested (and shrinking) 
dyadic intervals, $A_{ij}^2$ measures the dependence in $g_{X,Y}$ over a small dyadic
rectangle. The Brownian covariance thus combines measures of dependence 
across all scales in a multi-resolution hierarchy, and this is the sense in 
which it captures all kinds of dependence. This derivation also clarifies how 
changing the stochastic processes U and V can give covariance measures 
that emphasize different features of X and Y's joint distribution.
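
The multi-resolution reading can be made concrete numerically. The Python sketch below evaluates the Schauder functions on [0, 1] up to a chosen dyadic level, forms the sample analogue of the matrix A, and sums its squared entries. It is only a truncated plug-in illustration under the simplifying assumptions of this discussion (X and Y supported on [0, 1]); the function names and the `max_level` parameter are hypothetical, and this is not the estimator used in the paper.

```python
import numpy as np


def schauder_features(t, max_level=3):
    """Evaluate S_0(t) = t and the tent functions S_{j,k}, j = 0..max_level, on [0, 1].

    Returns an array of shape (len(t), 2 ** (max_level + 1)).
    """
    t = np.asarray(t, dtype=float)
    cols = [t]  # integral of the constant Haar function
    for j in range(max_level + 1):
        for k in range(2 ** j):
            u = (2 ** j) * t - k  # position inside the dyadic interval [k/2^j, (k+1)/2^j]
            tent = np.clip(np.minimum(u, 1.0 - u), 0.0, None)
            cols.append(2.0 ** (-j / 2) * tent)  # integral of the Haar function H_{j,k}
    return np.column_stack(cols)


def schauder_cov_matrix(x, y, max_level=3):
    """Sample analogue of A_ij = Cov(S_i(X), S_j(Y)) for the truncated system."""
    fx = schauder_features(x, max_level)
    fy = schauder_features(y, max_level)
    fx = fx - fx.mean(axis=0)
    fy = fy - fy.mean(axis=0)
    return fx.T @ fy / len(fx)


def truncated_brownian_cov2(x, y, max_level=3):
    """Sum of squared entries of the truncated sample matrix A."""
    a = schauder_cov_matrix(x, y, max_level)
    return float((a ** 2).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(size=2000)
    y_dep = 4 * (x - 0.5) ** 2                      # depends on x, zero linear correlation
    y_ind = 4 * (rng.uniform(size=2000) - 0.5) ** 2  # same marginal, independent of x
    print("dependent:  ", truncated_brownian_cov2(x, y_dep))
    print("independent:", truncated_brownian_cov2(x, y_ind))
```

Rows and columns of the sample matrix are indexed by dyadic scale and location, so inspecting which entries are large indicates at which resolution the dependence shows up.
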
^1 This construction is usually shown for $t \in [0, 1]$ but can be extended recursively; for
instance, if $t \in (1, 2]$, define $W_t = W_1 + \sum_{i \ge 0} Z'_i S_i(t - 1)$ with independent $Z'_i$'s. Using the
full line would require a slightly more general form of equation (7), which is straightforward
to derive.



